-
Notifications
You must be signed in to change notification settings - Fork 239
feat: add mcp classification server doc and example embedding based mcp classification server #417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…cp classification server Signed-off-by: Huamin Chen <[email protected]>
✅ Deploy Preview for vllm-semantic-router ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements MCP classification server support for semantic router by adding comprehensive documentation and a production-ready embedding-based classifier example. This enables externalized, dynamic classification logic that can be updated without router restarts.
Key changes:
- Complete MCP classification protocol specification with HTTP/stdio transport modes
- Overview documentation explaining architecture, benefits, and implementation options
- Production-ready embedding classifier using Qwen3-Embedding-0.6B model with FAISS vector search
- Training dataset with 95 labeled examples across 5 categories
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
File | Description |
---|---|
website/docs/tutorials/mcp-classification/protocol.md |
Comprehensive protocol specification for MCP classification servers |
website/docs/tutorials/mcp-classification/overview.md |
Architecture overview and implementation guidance for MCP classification |
examples/mcp-classifier-server/training_data.csv |
Training dataset with categorized examples for embedding classifier |
examples/mcp-classifier-server/server_embedding.py |
Production embedding-based MCP classifier implementation |
examples/mcp-classifier-server/requirements_embedding.txt |
Dependencies for embedding-based classifier |
examples/mcp-classifier-server/README.md |
Updated documentation comparing regex and embedding implementations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
} | ||
``` | ||
|
||
#### Response |
Copilot
AI
Oct 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The JSON structure is embedded as a string within JSON, making it difficult to parse and validate. Consider showing the parsed structure directly or using a cleaner format for better readability.
Copilot uses AI. Check for mistakes.
return classifier | ||
|
||
|
||
# Initialize MCP server | ||
app = Server("embedding-classifier") |
Copilot
AI
Oct 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The global classifier variable and initialization pattern could lead to issues in multi-threaded environments. Consider using a thread-safe singleton pattern or dependency injection instead of global state.
Copilot uses AI. Check for mistakes.
model_name, trust_remote_code=True | ||
) | ||
self.model = AutoModel.from_pretrained(model_name, trust_remote_code=True) | ||
self.device = torch.device("cpu") |
Copilot
AI
Oct 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The device is hardcoded to CPU, which will significantly impact performance if GPU is available. Consider auto-detecting GPU availability with fallback to CPU.
self.device = torch.device("cpu") | |
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
Copilot uses AI. Check for mistakes.
import asyncio | ||
|
||
while True: | ||
await asyncio.sleep(3600) |
Copilot
AI
Oct 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asyncio import inside the function creates unnecessary overhead. Move the import to the top of the file with other imports.
Copilot uses AI. Check for mistakes.
logger.error(f"Error handling request: {e}", exc_info=True) | ||
error = { | ||
"jsonrpc": "2.0", | ||
"id": request.get("id") if isinstance(request, dict) else None, |
Copilot
AI
Oct 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable request
shadows the aiohttp request object. This should reference the JSON data instead of the HTTP request object.
Copilot uses AI. Check for mistakes.
👥 vLLM Semantic Team NotificationThe following members have been identified for the changed files in this PR and have been automatically assigned: 📁
|
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
## Initialization Sequence | ||
|
||
When the semantic router starts: | ||
|
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The mermaid diagram may not render correctly if the website doesn't support mermaid syntax. Consider adding a note about mermaid support or providing an alternative representation.
> **Note:** The following diagram uses [Mermaid](https://mermaid-js.github.io/mermaid/) syntax. It will only render if your documentation viewer supports Mermaid diagrams. | |
> | |
> **Text alternative:** | |
> - Router sends `POST /mcp/initialize` to MCP Server | |
> - MCP Server responds with server capabilities | |
> - Router sends `POST /mcp/tools/list` to MCP Server | |
> - MCP Server responds with available tools | |
> - Router calls `classify_text → list_categories` on MCP Server | |
> - MCP Server responds with categories and prompts | |
> - Router is ready | |
> - Router calls `classify_text → classify_text` on MCP Server | |
> - MCP Server responds with classification result |
Copilot uses AI. Check for mistakes.
# Initialize classifier globally | ||
# Note: This is safe for aiohttp as it uses a single-threaded event loop. | ||
# For multi-process deployments, each process gets its own instance. | ||
classifier = None | ||
classifier_device = "auto" # Default device setting |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global state with lazy initialization can cause race conditions if multiple threads access get_classifier() simultaneously. Consider using threading.Lock or asyncio.Lock for thread-safe initialization.
Copilot uses AI. Check for mistakes.
error = { | ||
"jsonrpc": "2.0", | ||
"id": ( | ||
data.get("id") | ||
if "data" in locals() and isinstance(data, dict) | ||
else None | ||
), |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error handling logic checks for 'data' in locals() which may not work as expected in exception handlers where 'data' might not be defined. This could cause a NameError. Consider using a try-except block around data access instead.
error = { | |
"jsonrpc": "2.0", | |
"id": ( | |
data.get("id") | |
if "data" in locals() and isinstance(data, dict) | |
else None | |
), | |
try: | |
request_id = data.get("id") if isinstance(data, dict) else None | |
except Exception: | |
request_id = None | |
error = { | |
"jsonrpc": "2.0", | |
"id": request_id, |
Copilot uses AI. Check for mistakes.
try: | ||
while True: | ||
await asyncio.sleep(3600) | ||
except KeyboardInterrupt: | ||
logger.info("Shutting down server...") |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The infinite loop with 1-hour sleep intervals is unnecessarily resource-intensive. Consider using asyncio.Event or a more efficient waiting mechanism for graceful shutdown.
try: | |
while True: | |
await asyncio.sleep(3600) | |
except KeyboardInterrupt: | |
logger.info("Shutting down server...") | |
shutdown_event = asyncio.Event() | |
try: | |
await shutdown_event.wait() | |
except KeyboardInterrupt: | |
logger.info("Shutting down server...") | |
shutdown_event.set() |
Copilot uses AI. Check for mistakes.
general,How do I improve my sleep quality?,Health and wellness | ||
general,What are some creative hobbies to try?,Arts and crafts | ||
general,How can I be more environmentally friendly?,Sustainability and lifestyle | ||
|
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CSV file has an empty trailing line which could cause parsing issues in some CSV readers. Consider removing the empty line at the end of the file.
Copilot uses AI. Check for mistakes.
What type of PR is this?
Follow up #368 with example embedding based mcp classification server and documentation of how to extend the in-tree classification with mcp based approach.
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Release Notes: Yes/No